Large Scale Learning of Agent Rationality in Two-Player Zero-Sum Games

Author(s):  
Chun Kai Ling ◽  
Fei Fang ◽  
J. Zico Kolter

With the recent advances in solving large, zero-sum extensive-form games, there is a growing interest in the inverse problem of inferring underlying game parameters given only access to agent actions. Although recent work provides a powerful differentiable end-to-end learning framework that embeds a game solver within a deep-learning pipeline, allowing unknown game parameters to be learned via backpropagation, this framework faces significant limitations when applied to boundedly rational human agents and large-scale problems, limiting its practicality. In this paper, we address these limitations and propose a framework that is applicable to more practical settings. First, seeking to learn the rationality of human agents in complex two-player zero-sum games, we draw upon well-known ideas in decision theory to obtain a concise and interpretable agent behavior model, and derive solvers and gradients for end-to-end learning. Second, to scale up to large, real-world scenarios, we propose an efficient first-order primal-dual method that exploits the structure of extensive-form games, yielding significantly faster game solving and gradient computation. When tested on randomly generated games, we report speedups of orders of magnitude over previous approaches. We also demonstrate the effectiveness of our model on both real-world one-player settings and synthetic data.
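To make the regularized-equilibrium idea concrete, here is a minimal sketch of computing a logit quantal response equilibrium (QRE) of a zero-sum matrix game by damped fixed-point iteration. The payoff matrix, temperature, and damping factor are illustrative assumptions, and the paper's actual solver is a structured first-order primal-dual method rather than this simple scheme.

```python
# Minimal sketch (not the paper's solver): logit QRE of a zero-sum matrix
# game via a damped best-response fixed point.
import numpy as np

def softmax(v):
    w = np.exp(v - v.max())
    return w / w.sum()

def logit_qre(A, tau=1.0, alpha=0.5, iters=2000, tol=1e-10):
    """Damped fixed-point iteration for the entropy-regularized game
    min_x max_y  x^T A y - tau*H(x) + tau*H(y).
    The iteration converges to the unique QRE when tau is large enough
    relative to the payoffs in A."""
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    for _ in range(iters):
        x_new = (1 - alpha) * x + alpha * softmax(-(A @ y) / tau)      # row minimizes
        y_new = (1 - alpha) * y + alpha * softmax((A.T @ x_new) / tau)  # column maximizes
        if max(np.abs(x_new - x).max(), np.abs(y_new - y).max()) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies
x, y = logit_qre(A, tau=0.5)
print(x, y)  # both close to uniform, as expected for matching pennies
```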

Author(s):  
Chun Kai Ling ◽  
Fei Fang ◽  
J. Zico Kolter

Although recent work in AI has made great progress in solving large, zero-sum, extensive-form games, the underlying assumption in most past work is that the parameters of the game itself are known to the agents. This paper deals with the relatively under-explored but equally important "inverse" setting, where the parameters of the underlying game are not known to all agents, but must be learned through observations. We propose a differentiable, end-to-end learning framework for addressing this task. In particular, we consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop 1) a primal-dual Newton method for finding such equilibrium points in both normal- and extensive-form games; and 2) a backpropagation method that lets us analytically compute gradients of all relevant game parameters through the solution itself. This ultimately lets us learn the game by training in an end-to-end fashion, effectively by integrating a "differentiable game solver" into the loop of larger deep network architectures. We demonstrate the effectiveness of the learning method in several settings including poker and security game tasks.
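The "backpropagation through the solution" step can be sketched with the implicit function theorem: at the equilibrium fixed point, an adjoint linear system yields gradients of a loss with respect to the payoff matrix. The dense-Jacobian version below (reusing `logit_qre` from the sketch above) is only for intuition on small games; the paper itself uses a primal-dual Newton method with analytic gradients.

```python
# Intuition-only sketch of implicit differentiation through a QRE solution.
# Assumes logit_qre() from the previous sketch; builds the dense Jacobian,
# which is only feasible for tiny games.
import numpy as np

def qre_grad_A(A, tau, dL_dx, dL_dy):
    """Gradient of a loss L(x*, y*) w.r.t. A, where (x*, y*) satisfies the
    fixed point x = softmax(-A y / tau), y = softmax(A^T x / tau)."""
    x, y = logit_qre(A, tau)
    m, n = A.shape

    def dsoftmax(p):                 # Jacobian of softmax where output is p
        return np.diag(p) - np.outer(p, p)

    Jx, Jy = dsoftmax(x), dsoftmax(y)
    # Jacobian of the fixed-point map T(x, y) w.r.t. z = (x, y):
    Z = np.zeros((m + n, m + n))
    Z[:m, m:] = Jx @ (-A / tau)      # dT_x / dy
    Z[m:, :m] = Jy @ (A.T / tau)     # dT_y / dx
    # Adjoint system (I - dT/dz)^T v = dL/dz; solvable at a stable fixed point.
    v = np.linalg.solve((np.eye(m + n) - Z).T, np.concatenate([dL_dx, dL_dy]))
    vx, vy = v[:m], v[m:]
    # Assemble dL/dA = v^T dT/dA entry-wise (Jx, Jy are symmetric):
    return (-np.outer(Jx @ vx, y) + np.outer(x, Jy @ vy)) / tau

A = np.array([[1.0, -1.0], [-1.0, 1.0]])
dA = qre_grad_A(A, tau=0.5, dL_dx=np.array([1.0, 0.0]), dL_dy=np.zeros(2))
print(dA)  # gradient signal that a learner would backpropagate into A
```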


2014 ◽  
Vol 51 ◽  
pp. 829-866 ◽  
Author(s):  
B. Bosansky ◽  
C. Kiekintveld ◽  
V. Lisy ◽  
M. Pechoucek

Developing scalable solution algorithms is one of the central problems in computational game theory. We present an iterative algorithm for computing an exact Nash equilibrium for two-player zero-sum extensive-form games with imperfect information. Our approach combines two key elements: (1) the compact sequence-form representation of extensive-form games and (2) the algorithmic framework of double-oracle methods. The main idea of our algorithm is to restrict the game by allowing the players to play only selected sequences of available actions. After solving the restricted game, new sequences are added by finding best responses to the current solution using fast algorithms. We experimentally evaluate our algorithm on a set of games inspired by patrolling scenarios, board games, and card games. The results show significant runtime improvements in games admitting an equilibrium with small support, and substantial improvement in memory use even on games with large support. The improvement in memory use is particularly important because it allows our algorithm to solve much larger game instances than existing linear programming methods. Our main contributions include (1) a generic sequence-form double-oracle algorithm for solving zero-sum extensive-form games; (2) fast methods for maintaining a valid restricted game model when adding new sequences; (3) a search algorithm and pruning methods for computing best-response sequences; (4) theoretical guarantees about the convergence of the algorithm to a Nash equilibrium; and (5) experimental analysis of our algorithm, including an approximate variant, on several games.
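The restricted-game / best-response loop at the heart of the double-oracle method can be illustrated on a normal-form zero-sum game (the paper's contribution is carrying this structure over to the sequence form of extensive-form games). The following is a simplified sketch:

```python
# Simplified double-oracle sketch on a normal-form zero-sum game: solve a
# restricted game by LP, expand it with full-game best responses, repeat.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Max-min LP for the row (maximizing) player of an m x n game A."""
    m, n = A.shape
    c = np.concatenate([[-1.0], np.zeros(m)])          # maximize v
    A_ub = np.hstack([np.ones((n, 1)), -A.T])          # v <= (A^T x)_j
    A_eq = np.concatenate([[0.0], np.ones(m)])[None, :]  # sum x = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, 1)] * m)
    return res.x[1:], -res.fun                          # strategy, value

def double_oracle(A, tol=1e-8):
    rows, cols = [0], [0]                # start with one action per player
    while True:                          # each pass adds >= 1 new action
        sub = A[np.ix_(rows, cols)]
        x, v = solve_matrix_game(sub)
        y, _ = solve_matrix_game(-sub.T)                # column player's LP
        x_full = np.zeros(A.shape[0]); x_full[rows] = x
        y_full = np.zeros(A.shape[1]); y_full[cols] = y
        br_row = int(np.argmax(A @ y_full))             # full-game best responses
        br_col = int(np.argmin(A.T @ x_full))
        if (A @ y_full)[br_row] <= v + tol and (A.T @ x_full)[br_col] >= v - tol:
            return x_full, y_full, v                    # neither player improves
        if br_row not in rows: rows.append(br_row)
        if br_col not in cols: cols.append(br_col)

A = np.array([[0., -1, 1], [1, 0, -1], [-1, 1, 0]])     # rock-paper-scissors
print(double_oracle(A))                                 # value approximately 0
```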


2021 ◽  
Vol 8 (2) ◽  
pp. 273-287
Author(s):  
Xuewei Bian ◽  
Chaoqun Wang ◽  
Weize Quan ◽  
Juntao Ye ◽  
Xiaopeng Zhang ◽  
...  

Abstract: Recent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and produce visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state of the art at locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic and so cannot properly measure the performance of different methods.
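A minimal PyTorch sketch of the decomposition described above: a stroke-detection network and a generative removal network cascaded as a unit. All layer shapes and module names are illustrative assumptions, not the authors' architecture.

```python
# Illustrative sketch only: detection + removal networks combined into a
# processing unit, with units cascaded for the final model.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class StrokeDetector(nn.Module):
    """Predicts a per-pixel text-stroke mask from an RGB image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(3, 32), conv_block(32, 32),
                                  nn.Conv2d(32, 1, 1))
    def forward(self, img):
        return torch.sigmoid(self.body(img))

class StrokeRemover(nn.Module):
    """Generative sub-network: repaints stroke pixels given image + mask."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(4, 32), conv_block(32, 32),
                                  nn.Conv2d(32, 3, 1))
    def forward(self, img, mask):
        return self.body(torch.cat([img, mask], dim=1))

class RemovalUnit(nn.Module):
    """One detection-then-removal processing unit."""
    def __init__(self):
        super().__init__()
        self.detect, self.remove = StrokeDetector(), StrokeRemover()
    def forward(self, img):
        mask = self.detect(img)
        return self.remove(img, mask), mask

class Cascade(nn.Module):
    """Cascade of units: each unit cleans the previous unit's output."""
    def __init__(self, n_units=2):
        super().__init__()
        self.units = nn.ModuleList(RemovalUnit() for _ in range(n_units))
    def forward(self, img):
        masks = []
        for unit in self.units:
            img, mask = unit(img)
            masks.append(mask)
        return img, masks

out, masks = Cascade()(torch.randn(1, 3, 64, 64))  # smoke test
```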


2011 ◽  
pp. 2206-2249
Author(s):  
Aidan Hogan ◽  
Andreas Harth ◽  
Axel Polleres

In this article the authors discuss the challenges of performing reasoning on large-scale RDF datasets from the Web. Using ter-Horst's pD* fragment of OWL as a base, the authors compose a rule-based framework for application to web data, justifying their design decisions with undesirable examples taken directly from the Web. The authors further temper their OWL fragment through the consideration of "authoritative sources", which counteracts an observed behaviour they term "ontology hijacking": new ontologies published on the Web re-defining the semantics of existing entities resident in other ontologies. They then present their system for performing rule-based forward-chaining reasoning, which they call SAOR: Scalable Authoritative OWL Reasoner. Based upon observed characteristics of web data and of reasoning in general, they design their system to scale: it is based upon a separation of terminological data from assertional data and comprises a lightweight in-memory index, on-disk sorts, and file scans. The authors evaluate their methods on a dataset on the order of a hundred million statements collected from real-world Web sources and present scale-up experiments on a dataset on the order of a billion statements collected from the Web.
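A toy Python sketch of the terminological/assertional separation that SAOR is built around: schema triples live in a small in-memory index, while RDFS-style rules are applied to assertional triples to a fixpoint. The two rules and the triple encoding below are simplified illustrations, not SAOR's actual rule set or storage layout.

```python
# Toy forward-chaining sketch: terminological triples indexed in memory,
# assertional triples streamed through the rules (rdfs2 and rdfs9 analogues).
SCHEMA = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:enrolledIn", "rdfs:domain", "ex:Student"),
}
subclass = {(s, o) for s, p, o in SCHEMA if p == "rdfs:subClassOf"}
domain = {(s, o) for s, p, o in SCHEMA if p == "rdfs:domain"}

def reason(assertional):
    """Apply the two rules to a fixpoint; a real reasoner would instead use
    on-disk sorts and file scans over billions of statements."""
    facts = set(assertional)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "rdf:type":                 # subclass propagation (rdfs9)
                new |= {(s, "rdf:type", sup) for c, sup in subclass if c == o}
            new |= {(s, "rdf:type", d) for prop, d in domain if prop == p}  # rdfs2
        if new <= facts:
            return facts - set(assertional)     # inferred triples only
        facts |= new

print(reason([("ex:alice", "ex:enrolledIn", "ex:cs101")]))
# -> alice rdf:type ex:Student (domain rule), then ex:Person (subclass rule)
```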


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S358-S358
Author(s):  
David L Bostick ◽  
Kalvin Yu ◽  
Cynthia Yamaga ◽  
Ann Liu-Ferrara ◽  
Didier Morel ◽  
...  

Abstract

Background: Large-scale research on antimicrobial usage in real-world populations traditionally does not include infusion data. With automation, detailed infusion events are captured in device systems, providing opportunities to harness them for patient safety studies. However, due to the unstructured nature of infusion data, scaling up data ingestion, cleansing, and processing is challenging.

[Figure 1: Illustration of dosing complexity]

Methods: We applied algorithmic techniques to quantitate and visualize vancomycin administration data captured in real time by automated infusion devices from 3 acute care hospitals. The device data included timestamped infusion events: infusion started, paused, restarted, alarmed, and stopped. We used time-density-based segmentation algorithms to depict infusion sessions as bursts of event activity. We examined the clinical interpretability of the cluster-defined sessions in defining infusion events, dosing intensity, and duration.

Results: The algorithms identified 13,339 vancomycin infusion sessions from 2,417 unique patients (mean = 5.5 sessions per patient). Clustering captured vancomycin infusion sessions with correct event labels in >98% of cases and disentangled the ambiguity associated with unexpected events (e.g., multiple stopped/started events within a single infusion session). Segmentation of vancomycin infusion events on an example patient timeline is illustrated in Figure 1. The median duration of infusion sessions was 1.55 (1st, 3rd quartiles: 1.14, 2.02) hours, demonstrating clinical plausibility.

Conclusion: Passively captured vancomycin administration data from automated infusion device systems have ramifications for real-time bedside patient care practice. With large volumes of data, temporal event segmentation can be an efficient approach to generating clinically interpretable insights. This method scales up accuracy and consistency in handling longitudinal dosing data, and it can enable real-time population surveillance and patient-specific clinical decision support for large patient populations. A better understanding of infusion data may also have implications for vancomycin pharmacokinetic dosing.

Disclosures: David L. Bostick, PhD, Becton, Dickinson and Co. (Employee); Kalvin Yu, MD, Becton, Dickinson and Company (Employee), GlaxoSmithKline plc. (Other Financial or Material Support, Funding); Cynthia Yamaga, PharmD, BD (Employee); Ann Liu-Ferrara, PhD, Becton, Dickinson and Co. (Employee); Didier Morel, PhD, Becton, Dickinson and Co. (Employee); Ying P. Tabak, PhD, Becton, Dickinson and Co. (Employee)
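The session-segmentation idea can be illustrated with a simple time-gap rule: consecutive infusion events closer together than a threshold are grouped into one session. The threshold and event schema below are assumptions; the study's density-based algorithm is more sophisticated.

```python
# Illustrative sketch: group timestamped infusion events into sessions
# whenever consecutive events fall within a maximum time gap.
from datetime import datetime, timedelta

def segment_sessions(events, max_gap=timedelta(hours=2)):
    """events: iterable of (timestamp, label). Returns a list of sessions,
    each a chronologically ordered list of events."""
    sessions = []
    for ts, label in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, label))   # same burst of activity
        else:
            sessions.append([(ts, label)])     # gap too large: new session
    return sessions

t0 = datetime(2020, 1, 1, 8, 0)
events = [(t0, "started"), (t0 + timedelta(minutes=40), "paused"),
          (t0 + timedelta(minutes=50), "restarted"),
          (t0 + timedelta(minutes=95), "stopped"),
          (t0 + timedelta(hours=12), "started")]  # next dose: new session
for s in segment_sessions(events):
    print([(ts.strftime("%H:%M"), lab) for ts, lab in s])
```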


Author(s):  
Jiri Cermak ◽  
Branislav Bošanský ◽  
Viliam Lisý

We solve large two-player zero-sum extensive-form games with perfect recall. We propose a new algorithm based on fictitious play that significantly reduces the memory required for storing average strategies. The key feature is exploiting imperfect recall abstractions while preserving the convergence rate and guarantees of fictitious play applied directly to the perfect recall game. The algorithm creates a coarse imperfect recall abstraction of the perfect recall game and automatically refines its information set structure only where the imperfect recall might cause problems. Experimental evaluation shows that our novel algorithm is able to solve a simplified poker game with 7 × 10^5 information sets using an abstracted game with only 1.8% of the information sets of the original game. Additional experiments on poker and randomly generated games suggest that the relative size of the abstraction decreases as the size of the solved games increases.
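For reference, the average-strategy bookkeeping of plain fictitious play, whose memory footprint the abstraction technique above reduces in the extensive-form setting, looks as follows on a normal-form zero-sum game (a minimal sketch):

```python
# Minimal fictitious play sketch: each player best-responds to the
# opponent's empirical average strategy; the averages converge in value
# for zero-sum games.
import numpy as np

def fictitious_play(A, iters=10000):
    """Row player maximizes x^T A y; returns the average strategies."""
    m, n = A.shape
    row_counts, col_counts = np.zeros(m), np.zeros(n)
    row_counts[0] += 1; col_counts[0] += 1          # arbitrary initial plays
    for _ in range(iters):
        row_counts[np.argmax(A @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin(A.T @ (row_counts / row_counts.sum()))] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

x, y = fictitious_play(np.array([[0., -1, 1], [1, 0, -1], [-1, 1, 0]]))
print(x.round(3), y.round(3))  # near uniform for rock-paper-scissors
```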


2018 ◽  
Vol 47 (6) ◽  
pp. 2005-2014 ◽  
Author(s):  
Yuxi Tian ◽  
Martijn J Schuemie ◽  
Marc A Suchard

2019 ◽  
Vol 9 (20) ◽  
pp. 4291 ◽  
Author(s):  
Mahammad Humayoo ◽  
Xueqi Cheng

Regularization is a popular technique in machine learning for model estimation and for avoiding overfitting. Prior studies have found that modern ordered regularization can be more effective in handling highly correlated, high-dimensional data than traditional regularization, because ordered regularization can reject irrelevant variables and yield an accurate estimation of the parameters. How to scale up ordered regularization problems to large-scale training data remains an open question. This paper explores the problem of parameter estimation with ordered ℓ2-regularization via the Alternating Direction Method of Multipliers (ADMM), called ADMM-Oℓ2. The advantages of ADMM-Oℓ2 include (i) scaling the ordered ℓ2 penalty up to large datasets, (ii) predicting parameters correctly by excluding irrelevant variables automatically, and (iii) a fast convergence rate. Experimental results on both synthetic and real data indicate that ADMM-Oℓ2 performs better than, or comparably to, several state-of-the-art baselines.
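The ADMM splitting the abstract describes can be sketched on penalized least squares. For concreteness, the z-update below uses the proximal step of a plain ℓ2 (ridge) penalty; ADMM-Oℓ2 substitutes the proximal operator of the ordered ℓ2 penalty at that line.

```python
# Generic ADMM sketch for min_x f(x) + g(z) s.t. x = z, shown here with
# f(x) = 0.5||Ax - b||^2 and a plain ridge penalty g(z) = 0.5*lam*||z||^2.
import numpy as np

def admm_ridge(A, b, lam=1.0, rho=1.0, iters=500):
    n = A.shape[1]
    x = z = u = np.zeros(n)                          # u: scaled dual variable
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))    # factor once, reuse
    for _ in range(iters):
        rhs = Atb + rho * (z - u)                    # x-update: linear solve
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z = rho * (x + u) / (rho + lam)              # prox of 0.5*lam*||.||^2;
                                                     # the ordered-l2 prox goes here
        u = u + x - z                                # dual ascent
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10)); b = rng.normal(size=50)
x_admm = admm_ridge(A, b, lam=2.0)
x_exact = np.linalg.solve(A.T @ A + 2.0 * np.eye(10), A.T @ b)
print(np.allclose(x_admm, x_exact, atol=1e-6))       # matches closed form
```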

