Optimally Deceiving a Learning Leader in Stackelberg Games

2021 ◽  
Vol 72 ◽  
pp. 507-531
Author(s):  
Georgios Birmpas ◽  
Jiarui Gan ◽  
Alexandros Hollender ◽  
Francisco J. Marmolejo-Cossío ◽  
Ninad Rajgopal ◽  
...  

Recent results have shown that algorithms for learning the optimal commitment in a Stackelberg game are susceptible to manipulation by the follower. These learning algorithms operate by querying the best responses of the follower, who consequently can deceive the algorithm by using fake best responses, typically by responding according to fake payoffs that are different from the actual ones. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the fake payoffs that would make the learning algorithm output a commitment that benefits them the most. While this problem has been considered before, the related literature has only focused on a simple setting where the follower can only choose from a finite set of payoff matrices, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to efficiently compute (near-)optimal fake payoffs, for various scenarios of learning interaction between the leader and the follower. Our results also establish an interesting connection between the follower’s deception and the leader’s maximin utility: through deception, the follower can induce almost any (fake) Stackelberg equilibrium if and only if the leader obtains at least their maximin utility in this equilibrium.
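The deception described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical 2x2 game and, as a simplification that the paper does not make, a leader restricted to pure commitments who learns by querying the follower's best response to each row. All payoff values and function names are illustrative assumptions.

```python
# Hypothetical 2x2 game: rows are leader actions, columns are follower actions.
LEADER = [[3, 0], [2, 1]]
FOLLOWER_TRUE = [[1, 0], [0, 2]]
# Fake payoffs the follower pretends to have when answering queries (assumed values).
FOLLOWER_FAKE = [[0, 1], [0, 2]]


def best_response(reported, leader, row):
    """Follower's reply to `row` under the reported payoffs (ties favor the leader)."""
    cols = range(len(reported[row]))
    return max(cols, key=lambda j: (reported[row][j], leader[row][j]))


def learned_commitment(leader, reported):
    """Pure commitment the leader settles on after querying every row."""
    outcomes = [(i, best_response(reported, leader, i)) for i in range(len(leader))]
    return max(outcomes, key=lambda ij: leader[ij[0]][ij[1]])


def maximin(leader):
    """Leader's pure-strategy maximin value."""
    return max(min(row) for row in leader)


honest = learned_commitment(LEADER, FOLLOWER_TRUE)
deceived = learned_commitment(LEADER, FOLLOWER_FAKE)
# The follower's utility is always evaluated with the TRUE payoffs.
honest_follower_utility = FOLLOWER_TRUE[honest[0]][honest[1]]
deceived_follower_utility = FOLLOWER_TRUE[deceived[0]][deceived[1]]
```

In this toy instance the follower's true payoff rises from 1 to 2 under deception, while the leader still receives their pure maximin value of 1, which is in line with the maximin condition stated in the abstract.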

2019 ◽  
Vol 9 (16) ◽  
pp. 3348 ◽  
Author(s):  
Zhibin Feng ◽  
Guochun Ren ◽  
Jin Chen ◽  
Chaohui Chen ◽  
Xiaoqin Yang ◽  
...  

In this paper, we study the joint relay selection and power control optimization problem in an anti-jamming relay communication system. Considering the hierarchical competitive relationship between a user and a jammer, we formulate the anti-jamming problem as a Stackelberg game. From a game-theoretic perspective, the user acts as the leader and first selects a relay and a power strategy, while the jammer acts as the follower and then chooses its power strategy. Moreover, we prove the existence of a Stackelberg equilibrium. Based on the Q-learning algorithm and the multi-armed bandit method, a hierarchical joint optimization algorithm is proposed. Simulation results show the user’s strategy selection probability and the jammer’s regret. We compare the user’s and jammer’s utility under the proposed algorithm with a random selection algorithm to verify the algorithm’s superiority. Moreover, the influence of feedback error and eavesdropping error on utility is analyzed.


Author(s):  
Tingting Yang ◽  
Kailing Yao ◽  
Youming Sun ◽  
Fei Song ◽  
Yang Yang ◽  
...  

Using Unmanned Aerial Vehicles (UAVs) as relays is an effective way to extend coverage. It can also alleviate congestion and increase throughput, especially in UAV networks. However, since the energy of UAVs is limited and the resources in UAV networks are scarce, how to optimize network delay performance under these constraints should be well investigated. Besides, different resources, e.g. power and bandwidth, are coupled, which makes the optimization more complex. This article investigates the problem of joint power and bandwidth allocation in UAV backhaul networks, considering both delay performance and resource utilization efficiency. Considering the heterogeneous location characteristics of different UAVs, we formulate the optimization problem as a Stackelberg game. The relay UAV acts as the leader and the extended UAVs act as followers. Their utility functions take both delay performance and resource consumption into account. To capture the competitive relationship among followers, the sub-game is proved to be an exact potential game that admits Nash equilibria (NE). The existence of a Stackelberg Equilibrium (SE) is proved afterwards. We utilize a hierarchical learning algorithm (HLA) to find the best resource allocation strategies, which also reduces the computational complexity. Simulation results demonstrate the effectiveness of the proposed method.


2021 ◽  
Author(s):  
Zikai Feng ◽  
Yuanyuan Wu ◽  
Mengxing Huang ◽  
Di Wu

To counter malicious jamming of ground users by an intelligent unmanned aerial vehicle (UAV) in downlink communications, a new anti-UAV jamming strategy based on multi-agent deep reinforcement learning is studied in this paper. In this method, ground users aim to learn the best mobile strategies to avoid the jamming of the UAV. The problem is modeled as a Stackelberg game to describe the competitive interaction between the UAV jammer (leader) and ground users (followers). To reduce the computational cost of the equilibrium solution for the complex game with a large state space, a hierarchical multi-agent proximal policy optimization (HMAPPO) algorithm is proposed to decouple the hybrid game into several sub-Markov games, which updates the actor and critic networks of the UAV jammer and ground users at different time scales. Simulation results suggest that the HMAPPO-based anti-jamming strategy achieves performance comparable to the benchmark strategies with lower time complexity. The well-trained HMAPPO is able to obtain the optimal jamming strategy and the optimal anti-jamming strategies, which can approximate the Stackelberg equilibrium (SE).


Author(s):  
Kamel Meziani ◽  
Fazia Rahmoune ◽  
Mohammed Said Radjef

A Stackelberg game is used to study the service pricing and the strategic behavior of customers in an unreliable and totally unobservable M/M/1 queue under a reward-cost structure. At the first stage, the server manager, acting as a leader, chooses a service price and, at the second stage, a customer, arriving at the system and acting as a follower, chooses to join the system or an outside opportunity, knowing only the service price imposed by the server manager and the system parameters. We show that the constructed game admits an equilibrium and we give explicit forms of the server manager's and customers' equilibrium behavioral strategies. The results of the proposed model show that the assumption that customers are risk-neutral is fundamental to the standard approach that is usually used. Moreover, we determine the socially optimal price that maximizes the social welfare and we compare it to the Stackelberg equilibrium. We illustrate, by numerical examples, the effect of some system parameters on the equilibrium service price and the revenue of the server manager.
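The leader-follower structure of such a pricing game can be sketched as follows. This is a minimal illustration and not the paper's model: it assumes the classical reliable unobservable M/M/1 queue with risk-neutral customers and a linear waiting cost, whereas the paper treats an unreliable server. All function and parameter names are hypothetical.

```python
def joining_probability(price, reward, wait_cost, arrival_rate, service_rate):
    """Equilibrium joining probability of customers (followers) for a given price."""
    net_reward = reward - price
    # Nobody joins if even an empty system is not worth the price.
    if net_reward <= wait_cost / service_rate:
        return 0.0
    # Everybody joins if joining still pays off when all arrivals join.
    if (arrival_rate < service_rate
            and net_reward >= wait_cost / (service_rate - arrival_rate)):
        return 1.0
    # Otherwise customers mix so that joining yields zero expected utility:
    # net_reward = wait_cost / (service_rate - arrival_rate * q).
    return (service_rate - wait_cost / net_reward) / arrival_rate


def revenue(price, reward, wait_cost, arrival_rate, service_rate):
    """Server manager's (leader's) revenue rate, anticipating the followers' reply."""
    q = joining_probability(price, reward, wait_cost, arrival_rate, service_rate)
    return price * arrival_rate * q


def best_price_on_grid(reward, wait_cost, arrival_rate, service_rate, steps=1000):
    """Crude grid search for the leader's price; a closed form is not attempted here."""
    prices = [reward * k / steps for k in range(steps + 1)]
    return max(
        prices,
        key=lambda p: revenue(p, reward, wait_cost, arrival_rate, service_rate),
    )
```

The mixed-strategy branch relies on the standard M/M/1 mean sojourn time, 1 / (service_rate - effective arrival rate), so the sketch does not carry over to a server with breakdowns without changing that expression.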


Energies ◽  
2019 ◽  
Vol 12 (2) ◽  
pp. 325 ◽  
Author(s):  
Shijun Chen ◽  
Huwei Chen ◽  
Shanhe Jiang

Electric vehicles (EVs) are designed to improve energy efficiency and reduce environmental pollution when they are widely and reasonably used in the transport system. However, due to the characteristics of EV batteries, charging plays an important role in the adoption of EVs. With the help of advanced technologies, charging stations powered by smart grid operators (SGOs) can conveniently supply charging service to EV users. In this paper, we consider EVs charged by charging station operators (CSOs) in heterogeneous networks (Hetnet), through which they can exchange information with each other. Considering the trading relationship among EV users, CSOs, and SGOs, we design utility functions for each of them in Hetnet, taking demand uncertainty into account. In order to maximize profits, we formulate the charging problem as a four-stage Stackelberg game, through which the optimal strategy is studied and analyzed. For this game model, we theoretically prove and discuss the existence and uniqueness of the Stackelberg equilibrium (SE). The optimal solution of the optimization problem can be obtained with the proposed iterative algorithm. Simulation results show the performance of the strategy and confirm the efficiency of the model in Hetnet.


Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 830
Author(s):  
Seokho Kang

k-nearest neighbor (kNN) is a widely used learning algorithm for supervised learning tasks. In practice, the main challenge when using kNN is its high sensitivity to its hyperparameter setting, including the number of nearest neighbors k, the distance function, and the weighting function. To improve the robustness to hyperparameters, this study presents a novel kNN learning method based on a graph neural network, named kNNGNN. Given training data, the method learns a task-specific kNN rule in an end-to-end fashion by means of a graph neural network that takes the kNN graph of an instance to predict the label of the instance. The distance and weighting functions are implicitly embedded within the graph neural network. For a query instance, the prediction is obtained by performing a kNN search from the training data to create a kNN graph and passing it through the graph neural network. The effectiveness of the proposed method is demonstrated using various benchmark datasets for classification and regression tasks.
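The hyperparameter sensitivity mentioned in this abstract can be seen in a plain kNN classifier, sketched below. This is the conventional baseline rule, not the kNNGNN method; the function names and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uniform(distance):
    """Uniform weighting: every neighbor gets one vote."""
    return 1.0


def inverse_distance(distance):
    """Inverse-distance weighting; the small constant avoids division by zero."""
    return 1.0 / (distance + 1e-9)


def knn_predict(train_x, train_y, query, k=3, distance=euclidean, weight=uniform):
    """Classify `query` by a weighted vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda xy: distance(xy[0], query))
    votes = defaultdict(float)
    for point, label in ranked[:k]:
        votes[label] += weight(distance(point, query))
    return max(votes, key=votes.get)


# Toy data: one "a" point very close to the query, two "b" points farther away.
train_x = [(0.1, 0.0), (1.0, 0.0), (0.0, 1.0)]
train_y = ["a", "b", "b"]
query = (0.0, 0.0)

pred_k1 = knn_predict(train_x, train_y, query, k=1)
pred_k3_uniform = knn_predict(train_x, train_y, query, k=3)
pred_k3_weighted = knn_predict(train_x, train_y, query, k=3, weight=inverse_distance)
```

On this toy data the prediction flips from "a" (k=1) to "b" (k=3, uniform votes) and back to "a" (k=3, inverse-distance votes), which is the kind of dependence on k and the weighting function that a learned kNN rule aims to remove.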


Author(s):  
Alain Jean-Marie ◽  
Mabel Tidball ◽  
Víctor Bucarey López

We consider a discrete-time, infinite-horizon dynamic game of groundwater extraction. A Water Agency charges an extraction cost to water users and controls the marginal extraction cost so that it depends not only on the level of groundwater but also on total water extraction (through a parameter [Formula: see text] that represents the degree of strategic interactions between water users) and on rainfall (through parameter [Formula: see text]). The water users are selfish and myopic, and the goal of the agency is to give them incentives so as to improve their total discounted welfare. We look at this problem in several situations. In the first situation, the parameters [Formula: see text] and [Formula: see text] are considered to be fixed over time. The first result shows that when the Water Agency is patient (the discount factor tends to 1), the optimal marginal extraction cost asks for strategic interactions between agents. The contrary holds for a discount factor near 0. In a second situation, we look at the dynamic Stackelberg game where the Agency decides at each time what cost parameter they must announce. We study theoretically and numerically the solution to this problem. Simulations illustrate the possibility that threshold policies are good candidates for optimal policies.


2020 ◽  
Vol 13 ◽  
pp. 8-23
Author(s):  
Movlatkhan T. Agieva ◽  
Olga I. Gorbaneva ◽  

We consider a dynamic Stackelberg game-theoretic model of the coordination of social and private interests (SPICE-model) of resource allocation in marketing networks. The dynamics of the controlled system describe an interaction of the members of a target audience (basic agents) that leads to a change of their opinions (cost of buying the goods and services of firms competing on a market). An interaction of the firms (influence agents) is formalized as their differential game in strategic form. The payoff functional of each firm includes two terms: the summary opinion of the basic agents with consideration of their marketing costs (a common interest of all firms), and the income from investments in a private activity. The latter income is described by a linear function. The firms exert their influence not on all basic agents but only on the members of strong subgroups of the influence digraph (opinion leaders). The opinion leaders determine the stable final opinions of all members of the target audience. A coordinating principal determines the firms' marketing budgets and maximizes the summary opinion of the basic agents with consideration of the allocated resources. The Nash equilibrium in the game of influence agents and the Stackelberg equilibrium in a general hierarchical game of the principal with them are found. It is proved that the value of opinion of a basic agent is the same for all influence agents and the principal. It is also proved that the influence agents assign fewer resources to the marketing efforts than the principal would like.


Author(s):  
Yang Chen ◽  
Xiao Kou ◽  
Mohammed Olama ◽  
Helia Zandi ◽  
Chenang Liu ◽  
...  

Grid integration of the increasing number of distributed energy resources could be challenging in terms of new infrastructure investment, power grid stability, etc. To accommodate more renewables locally and reduce the need for extensive electricity transmission, a community energy transaction market is assumed, with the market operator as the leader whose responsibility is to generate local energy prices and clear the energy transaction payments among the prosumers (followers). The leader and the multiple followers have competing objectives of revenue maximization and operational cost minimization. This non-cooperative leader-follower (Stackelberg) game is formulated using a bi-level optimization framework, where a novel modular pumped hydro storage technology (GLIDES system) is set as the upper-level market operator, and the lower-level prosumers are nearby commercial buildings. The best responses of the lower-level model can be derived from necessary optimality conditions, and thus the bi-level model can be transformed into a single-level optimization model by replacing the lower-level model with its Karush-Kuhn-Tucker (KKT) necessary conditions. Several experiments have been designed to compare the local energy transaction behavior and profit distribution under different demand response levels and different local price structures. The experimental results indicate that the lower-level prosumers could benefit the most when local buying and selling prices are equal, while the maximum revenue potential for the upper-level agent could be reached with non-equal trading prices.
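The single-level reformulation mentioned in this abstract can be written generically as below. The symbols are assumptions for illustration and are not taken from the paper: $x$ collects the upper-level decisions (for example, local prices), $y$ the lower-level decisions, $F$ the leader's revenue, and $f$ and $g_i$ a follower's cost and constraints. The sketch assumes the lower-level problem is convex in $y$, differentiable, and satisfies a constraint qualification, so that the KKT conditions characterize the follower's best response. The bi-level problem $\max_x F(x, y)$ subject to $y \in \arg\min_{y'} \{ f(x, y') : g(x, y') \le 0 \}$ then becomes:

```latex
\begin{aligned}
\max_{x,\, y,\, \lambda} \quad & F(x, y) \\
\text{s.t.} \quad & \nabla_y f(x, y) + \sum_i \lambda_i \nabla_y g_i(x, y) = 0, \\
& g_i(x, y) \le 0, \qquad \lambda_i \ge 0, \qquad \lambda_i \, g_i(x, y) = 0 \quad \forall i .
\end{aligned}
```

Any upper-level constraints on $x$ are omitted here for brevity. With several followers, one such block of stationarity, feasibility, and complementarity conditions would appear per follower.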


2014 ◽  
Vol 5 (2) ◽  
pp. 61-74 ◽  
Author(s):  
Fatemeh Afghah ◽  
Abolfazl Razi

In this paper, a novel property-right spectrum leasing solution based on a Stackelberg game is proposed for Cognitive Radio Networks (CRN), where some of the secondary users exhibit probabilistic dishonest behavior. In this model, the Primary User (PU) as the spectrum owner allows the Secondary User (SU) to access the shared spectrum for a fraction of time in exchange for cooperative relaying service provided by the SU. A reputation-based mechanism is proposed that enables the PU to monitor the cooperative behavior of the SUs and restrict its search space at each time slot to the secondary users that did not exhibit dishonest behavior in the preceding time slots. The proposed reputation-based solution outperforms classical Stackelberg games from the perspectives of both the primary user and the reliable secondary users. This novel method of filtering out unreliable users increases the PU's expected utility over consecutive time slots and also encourages the SUs to follow the rules of the game.

