Rule based strategies for large extensive-form games: A specification language for No-Limit Texas Hold'em agents

2014, Vol. 11(4), pp. 1249-1269
Author(s): Luís Teófilo, Luís Reis, Henrique Cardoso, Pedro Mendes

Poker is used to measure progress in extensive-form games research due to its unique characteristics: playing agents must deal with incomplete information, stochastic scenarios, and a large number of decision points. The development of Poker agents has seen significant advances in one-on-one matches, but there are still no consistent results in multiplayer games or in games against human experts. To allow experts to aid in improving the agents' performance, we have created a high-level strategy specification language. To support strategy definition, we have also developed an intuitive graphical tool. Additionally, we have created a strategy inference system based on a dynamically weighted Euclidean distance. This approach was validated through the creation of simple agents and by successfully inferring strategies from 10 human players. The created agents were able to beat previously developed mid-level agents by a good profit margin.
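
As a rough illustration of the inference step described above, here is a minimal Python sketch of nearest-rule matching under a weighted Euclidean distance. The rule representation, feature vectors, and weights are assumptions for illustration; the abstract does not specify how the dynamic weighting is computed.

```python
import math

def weighted_euclidean(x, center, weights):
    """Distance between an observed game-state feature vector and a
    rule's reference point, with a per-feature weight vector."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, x, center)))

def infer_rule(state_features, rules, weights):
    """Return the name of the rule whose reference point is closest to
    the current state. `rules` maps rule names to feature vectors;
    the weights would be recomputed dynamically (e.g. per betting round)."""
    return min(rules, key=lambda name: weighted_euclidean(
        state_features, rules[name], weights))

# Hypothetical features: (hand strength, pot odds, aggression).
rules = {"fold": (0.1, 0.2, 0.1),
         "call": (0.5, 0.5, 0.4),
         "raise": (0.9, 0.7, 0.8)}
action = infer_rule((0.85, 0.6, 0.7), rules, weights=(2.0, 1.0, 1.0))  # -> "raise"
```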

Author(s): Trevor Davis, Kevin Waugh, Michael Bowling

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.
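
To make the idea concrete, below is a hedged Python sketch of one CFR building block (regret matching) combined with a simple convex constraint: per-action probability floors. This is only a toy instance of a convex constraint set, not the paper's general constrained-CFR algorithm; the clip-and-renormalise step is an assumption chosen for simplicity.

```python
import numpy as np

def regret_matching(regrets):
    """Standard regret matching: play each action in proportion to its
    positive cumulative regret; fall back to uniform if none is positive."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    if total == 0.0:
        return np.full(len(regrets), 1.0 / len(regrets))
    return pos / total

def enforce_lower_bounds(strategy, min_probs):
    """Map a strategy into the convex set {s : s >= min_probs, sum(s) = 1}
    by clipping and renormalising the slack mass. Box lower bounds
    (e.g. 'bet at least 10% of the time') are one simple convex family;
    the paper handles general convex constraints. Assumes min_probs.sum() < 1."""
    clipped = np.maximum(strategy, min_probs)
    slack = clipped - min_probs
    free_mass = 1.0 - min_probs.sum()
    return min_probs + slack * (free_mass / slack.sum())

# Example: three actions, cumulative regrets, and a 10% floor on action 0.
strategy = regret_matching(np.array([2.0, -1.0, 0.5]))
constrained = enforce_lower_bounds(strategy, np.array([0.1, 0.0, 0.0]))
```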


Author(s): Jiri Cermak, Branislav Bošanský, Viliam Lisý

We solve large two-player zero-sum extensive-form games with perfect recall. We propose a new algorithm based on fictitious play that significantly reduces memory requirements for storing average strategies. The key feature is exploiting imperfect recall abstractions while preserving the convergence rate and guarantees of fictitious play applied directly to the perfect recall game. The algorithm creates a coarse imperfect recall abstraction of the perfect recall game and automatically refines its information set structure only where the imperfect recall might cause problems. Experimental evaluation shows that our novel algorithm is able to solve a simplified poker game with 7×10^5 information sets using an abstracted game with only 1.8% of the information sets of the original game. Additional experiments on poker and randomly generated games suggest that the relative size of the abstraction decreases as the size of the solved games increases.
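
For context, the following self-contained Python sketch shows plain fictitious play on a small zero-sum matrix game, the baseline whose average strategies the abstraction above compresses. The matrix-game setting and iteration count are illustrative simplifications of the extensive-form case.

```python
import numpy as np

def fictitious_play(payoff, iters=10000):
    """Fictitious play on a two-player zero-sum matrix game: each player
    best-responds to the opponent's empirical average strategy, and the
    averages converge to a Nash equilibrium."""
    m, n = payoff.shape
    counts_row = np.zeros(m)
    counts_col = np.zeros(n)
    counts_row[0] += 1  # seed with an arbitrary first pure play
    counts_col[0] += 1
    for _ in range(iters):
        br_row = np.argmax(payoff @ counts_col)   # row player maximises
        br_col = np.argmin(counts_row @ payoff)   # column player minimises
        counts_row[br_row] += 1
        counts_col[br_col] += 1
    return counts_row / counts_row.sum(), counts_col / counts_col.sum()

# Rock-paper-scissors: both averages approach (1/3, 1/3, 1/3).
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
avg_row, avg_col = fictitious_play(rps)
```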


Author(s): Jiří Čermák, Viliam Lisý, Branislav Bošanský

Information abstraction is one of the methods for tackling large extensive-form games (EFGs). Removing some information available to players reduces the memory required for computing and storing strategies. We present novel domain-independent abstraction methods for creating very coarse abstractions of EFGs that still yield strategies that are (near) optimal in the original game. First, the methods start with an arbitrary abstraction of the original game (domain-specific or the coarsest possible). Next, they iteratively detect which information is required in the abstract game so that a (near) optimal strategy in the original game can be found, and incorporate this information into the abstract game. Moreover, the methods are able to exploit imperfect-recall abstractions, where players can even forget the history of their own actions. We present two algorithms that follow these steps -- FPIRA, based on fictitious play, and CFR+IRA, based on counterfactual regret minimization. The experimental evaluation confirms that our methods can closely approximate the Nash equilibrium of large games using an abstraction with only 0.9% of the information sets of the original game.
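
The shared solve-measure-refine loop of FPIRA and CFR+IRA can be summarised by the following Python skeleton. All the callables (solve, exploitability, split_worst) and the abstraction object are hypothetical placeholders; the paper's actual refinement criteria are not shown here.

```python
def refine_abstraction(solve, exploitability, split_worst, abstraction,
                       epsilon=1e-3, max_rounds=100):
    """Generic solve-measure-refine loop in the spirit of FPIRA/CFR+IRA.
    `solve` computes a strategy for the abstract game, `exploitability`
    evaluates that strategy lifted back into the original game, and
    `split_worst` refines the information sets responsible for the gap."""
    strategy = solve(abstraction)
    for _ in range(max_rounds):
        if exploitability(strategy) <= epsilon:
            break  # near-optimal in the original game; stop refining
        abstraction = split_worst(abstraction, strategy)
        strategy = solve(abstraction)
    return strategy, abstraction
```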


1994, Vol. 7(3), pp. 309-317
Author(s): Jacques Crémer

2011, Vol. 52(1), pp. 75-102
Author(s): Carlos Alós-Ferrer, Klaus Ritzberger

2021, Vol. 31(3), pp. 1-26
Author(s): Aravind Balakrishnan, Jaeyoung Lee, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, even when naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy, having learned that its measurement is unreliable.
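
A minimal Python sketch of the domain-randomization idea applied to a perception channel follows. The noise model (per-episode bias, Gaussian noise, dropout) and all parameter ranges are assumptions for illustration; the paper's perception-error models are more detailed.

```python
import random

def sample_noise_model(rng=random):
    """Draw one randomized perception-error model per training episode,
    so the policy cannot over-commit to any single error profile.
    All ranges below are illustrative assumptions."""
    bias = rng.uniform(-1.0, 1.0)    # m/s systematic velocity offset
    sigma = rng.uniform(0.0, 2.0)    # m/s noise standard deviation
    drop = rng.uniform(0.0, 0.1)     # probability of a missing reading

    def observe(true_velocity):
        if rng.random() < drop:
            return None              # sensor dropout: no measurement
        return true_velocity + bias + rng.gauss(0.0, sigma)
    return observe

# Usage: resample the error model at the start of every episode.
observe = sample_noise_model()
noisy_velocity = observe(12.5)
```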


2021, Vol. 20(01), pp. 2150013
Author(s): Mohammed Abu-Arqoub, Wael Hadi, Abdelraouf Ishtaiwi

Associative Classification (AC) classifiers are of substantial interest due to their ability to mine vast sets of rules. However, researchers have shown over the decades that a large number of these mined rules are trivial, irrelevant, redundant, and sometimes harmful, as they can bias decision-making. Accordingly, in this paper we address these challenges and propose a novel AC approach based on the RIPPER algorithm, which we refer to as ACRIPPER. Our new approach combines the strength of the RIPPER algorithm with the classical AC method in order to achieve: (1) a reduction in the number of rules being mined, especially those that are largely insignificant; and (2) tight integration between the confidence and support of the rules on the one hand and the class imbalance level in the prediction phase on the other. Our experimental results, using 20 different well-known datasets, reveal that the proposed ACRIPPER significantly outperforms the well-known rule-based algorithms RIPPER and J48. Moreover, ACRIPPER significantly outperforms the current AC-based algorithms CBA, CMAR, ECBA, FACA, and ACPRISM. Finally, ACRIPPER achieves the best average and the best ranking on the accuracy measure.
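
To illustrate the rule-reduction idea in AC methods generally, here is a small Python sketch that filters and ranks mined class-association rules by support and confidence. The thresholds and rule representation are illustrative assumptions, not ACRIPPER's actual mechanism.

```python
def prune_rules(rules, min_support=0.01, min_confidence=0.6):
    """Filter mined class-association rules the way AC methods commonly do:
    keep a rule only if it is frequent and predictive enough. Each rule is
    a dict with 'support' and 'confidence' in [0, 1]; both thresholds are
    illustrative, not ACRIPPER's actual values."""
    kept = [r for r in rules
            if r["support"] >= min_support
            and r["confidence"] >= min_confidence]
    # Rank for prediction: higher confidence first, ties broken by support.
    kept.sort(key=lambda r: (r["confidence"], r["support"]), reverse=True)
    return kept

# Example: the second rule is dropped as trivial (low support and confidence).
mined = [{"rule": "A & B -> class1", "support": 0.05, "confidence": 0.9},
         {"rule": "C -> class2", "support": 0.001, "confidence": 0.4}]
ranked = prune_rules(mined)
```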

