Efficient Opponent Exploitation in No-Limit Texas Hold’em Poker: A Neuroevolutionary Method Combined with Reinforcement Learning

Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2087
Author(s):  
Jiahui Xu ◽  
Jing Chen ◽  
Shaofei Chen

In the development of artificial intelligence (AI), games have often served as benchmarks that drive remarkable breakthroughs in models and algorithms. No-limit Texas Hold’em (NLTH) is one of the most popular and challenging poker games. Despite numerous studies on this subject, some important problems remain unsolved, such as opponent exploitation, i.e., adaptively and effectively exploiting specific opponent strategies; this is acknowledged as a vital issue in NLTH and many real-world scenarios. Previous researchers tried to use an off-policy reinforcement learning (RL) method to train agents that learn directly from historical strategy interactions, but suffered from sparse rewards. Other researchers instead adopted a neuroevolutionary (NE) method in place of RL for policy parameter updates, but suffered from high sample complexity due to the large scale of NLTH. In this work, we propose NE_RL, a novel method combining NE with RL for opponent exploitation in NLTH. Our method is a hybrid framework that uses NE’s evolutionary computation with a long-term fitness metric to address the sparse reward feedback in NLTH and retains RL’s gradient-based updates for higher learning efficiency. Experimental results against multiple baseline opponents demonstrate the feasibility of our method and its significant improvement over previous methods. We hope this paper provides an effective new approach for opponent exploitation in NLTH and other large-scale imperfect-information games.
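
The abstract describes the hybrid only at a high level. As a rough illustration of the kind of loop it implies, here is a minimal sketch in which an evolutionary population is ranked by a long-term fitness metric while a gradient-based RL learner is periodically injected into the population; `evaluate`, `rl_update`, the flat parameter layout, and all hyperparameters are assumptions for illustration, not NE_RL itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def hybrid_ne_rl(dim, evaluate, rl_update, pop_size=10, generations=100):
    """Sketch of a hybrid loop: an NE population ranked by long-term
    fitness, plus one gradient-based RL learner injected each cycle."""
    population = [rng.normal(size=dim) for _ in range(pop_size)]
    rl_agent = rng.normal(size=dim)
    for _ in range(generations):
        # NE side: a long-term fitness metric (e.g., average chips won
        # over many hands) sidesteps sparse per-hand rewards.
        population.sort(key=evaluate, reverse=True)
        elites = population[: pop_size // 2]
        offspring = [elites[rng.integers(len(elites))]
                     + 0.02 * rng.normal(size=dim)   # Gaussian mutation
                     for _ in range(pop_size - len(elites))]
        population = elites + offspring
        # RL side: a gradient-based update keeps learning efficient;
        # the learner then replaces the weakest population member.
        rl_agent = rl_update(rl_agent)
        population[-1] = rl_agent.copy()
    return population[0]
```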

2020 ◽  
Vol 65 (2) ◽  
pp. 31
Author(s):  
T.V. Pricope

Many real-world applications can be described as large-scale games of imperfect information. Games of this kind are considerably harder than deterministic ones because the search space is even larger. In this paper, I explore the power of reinforcement learning in such an environment by examining one of the most popular games of this type, no-limit Texas Hold’em Poker, which remains unsolved, developing multiple agents with different learning paradigms and techniques and then comparing their respective performances. When applied to no-limit Hold’em Poker, deep reinforcement learning agents clearly outperform agents with a more traditional approach. Moreover, while the latter agents rival a human beginner’s level of play, the reinforcement learning agents compare to an amateur human player. The main algorithm uses fictitious play in combination with artificial neural networks (ANNs) and some handcrafted metrics. We also applied the main algorithm to another game of imperfect information, less complex than poker, to show the scalability of this solution and the increase in performance when put neck and neck with established classical approaches from the reinforcement learning literature.
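
For readers unfamiliar with fictitious play, the core idea is that each player repeatedly best-responds to the opponent's empirical average strategy, and those averages converge toward equilibrium in zero-sum games. A minimal matrix-game version is sketched below; the paper's neural-network variant (learned best responses and average policies) is far more involved, and the toy payoff matrix is an assumption.

```python
import numpy as np

def fictitious_play(payoff, iters=1000):
    """Two-player zero-sum fictitious play on a payoff matrix: each
    side best-responds to the opponent's empirical average strategy."""
    n, m = payoff.shape
    row_counts, col_counts = np.ones(n), np.ones(m)
    for _ in range(iters):
        br_row = np.argmax(payoff @ (col_counts / col_counts.sum()))
        br_col = np.argmin((row_counts / row_counts.sum()) @ payoff)
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    # Empirical action frequencies approximate equilibrium strategies.
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: both averages converge toward the uniform mix.
row, col = fictitious_play(np.array([[1., -1.], [-1., 1.]]))
```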


Author(s):  
Chunyi Wu ◽  
Gaochao Xu ◽  
Yan Ding ◽  
Jia Zhao

Large-scale task processing based on cloud computing has become crucial to big data analysis in recent years. Most previous work utilizes conventional methods and architectures designed for general-scale tasks to process massive numbers of tasks, which is limited by issues of computing capability, data transmission, etc. Based on this argument, a fat-tree-structure-based approach called LTDR (Large-scale Tasks processing using Deep network model and Reinforcement learning) is proposed in this work. Aiming at exploring the optimal task allocation scheme, a virtual network mapping algorithm based on a deep convolutional neural network and Q-learning is presented herein. After feature extraction, we design and implement a policy network to make node mapping decisions. The link mapping scheme is obtained by the designed distributed value-function-based reinforcement learning model. Eventually, tasks are allocated onto proper physical nodes and processed efficiently. Experimental results show that LTDR can significantly improve the utilization of physical resources and long-term revenue while satisfying task requirements in big data scenarios.
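
As a toy illustration of value-based mapping decisions, the sketch below runs tabular Q-learning over sequential node choices; the paper's deep convolutional feature extractor and policy network are replaced here by a plain Q-table, and the `step` environment is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_learning_mapping(n_states, n_nodes, step, episodes=500,
                       alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning over sequential node-mapping decisions;
    step(s, a) is a placeholder environment returning (s2, r, done),
    where r reflects resource utilization / revenue."""
    Q = np.zeros((n_states, n_nodes))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy choice of a physical node for the next task.
            a = rng.integers(n_nodes) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```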


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Navid Ghajarnia ◽  
Zahra Kalantari ◽  
René Orth ◽  
Georgia Destouni

Soil moisture is an important variable for land-climate and hydrological interactions. To investigate emergent large-scale, long-term interactions between soil moisture and other key hydro-climatic variables (precipitation, actual evapotranspiration, runoff, temperature), we analyze monthly values and anomalies of these variables in 1378 hydrological catchments across Europe over the period 1980–2010. The study distinguishes results for the main European climate regions, and tests how sensitive or robust they are to the use of three alternative observational and re-analysis datasets. Robustly across the European climates and datasets, monthly soil moisture anomalies correlate well with runoff anomalies, and extreme soil moisture and runoff values also largely co-occur. For precipitation, evapotranspiration, and temperature, anomaly correlation and extreme value co-occurrence with soil moisture are overall lower than for runoff. The runoff results indicate a possible new approach to assessing variability and change of large-scale soil moisture conditions by use of long-term time series of monitored catchment-integrating stream discharges.
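
For concreteness, anomaly analysis of this kind typically removes each calendar month's long-term mean before correlating series. A minimal sketch, with series layout and variable names assumed:

```python
import numpy as np

def monthly_anomalies(x, months):
    """Subtract each calendar month's long-term mean from a monthly
    series, removing the mean seasonal cycle."""
    x = np.asarray(x, dtype=float)
    anom = np.empty_like(x)
    for m in range(1, 13):
        sel = months == m
        anom[sel] = x[sel] - x[sel].mean()
    return anom

# E.g., anomaly correlation of soil moisture (sm) and runoff (q)
# over 1980-2010 monthly series (names assumed):
# months = np.tile(np.arange(1, 13), 31)
# r = np.corrcoef(monthly_anomalies(sm, months),
#                 monthly_anomalies(q, months))[0, 1]
```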


Author(s):  
M. Evans

The approaches traditionally used to quantify creep and creep fracture are critically assessed and reviewed in relation to a new approach proposed by Wilshire and Scharning. The characteristics, limitations, and predictive accuracies of these models are illustrated by reference to information openly available for the bainitic 1Cr–1Mo–0.25V steel. When applied to this comprehensive long-term data set, the estimated 100,000–300,000 h strengths obtained from the older, so-called traditional methods varied considerably. Further, the isothermal predictions from these models became very unstable beyond 100,000 h. In contrast, normalizing the applied stress by an appropriate ultimate tensile strength value not only reduced the melt-to-melt scatter in rupture life, but also allowed the 100,000 h strengths determined from this model for this large-scale test program to be predicted very accurately by extrapolation of creep life measurements lasting less than 5000 h. The approach therefore offers the potential for reducing the scale and cost of current procedures for the acquisition of long-term engineering design data.
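
The abstract leaves the Wilshire and Scharning relation unstated; for context, the normalized-stress form of their rupture-life equation is commonly written as below (notation assumed), where sigma_TS is the ultimate tensile strength, t_f the rupture life, Q*_c an apparent activation energy, R the gas constant, T the temperature, and k_1 and u fitted constants:

```latex
\frac{\sigma}{\sigma_{\mathrm{TS}}}
  = \exp\!\left\{ -k_{1} \left[ t_{f}\,
      \exp\!\left( -\frac{Q_{c}^{*}}{RT} \right) \right]^{u} \right\}
```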


Author(s):  
Daochen Zha ◽  
Kwei-Herng Lai ◽  
Songyi Huang ◽  
Yuanpu Cao ◽  
Keerthana Reddy ◽  
...  

We present RLCard, a Python platform for reinforcement learning research and development in card games. RLCard supports various card environments and several baseline algorithms with unified easy-to-use interfaces, aiming at bridging reinforcement learning and imperfect information games. The platform provides flexible configurations of state representation, action encoding, and reward design. RLCard also supports visualizations for algorithm debugging. In this demo, we showcase two representative environments and their visualization results. We conclude this demo with challenges and research opportunities brought by RLCard. A video is available on YouTube.
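
A minimal usage sketch, following the project's documented quickstart (API names such as `rlcard.make`, `env.set_agents`, and `env.run` are taken from the RLCard docs and may differ across versions):

```python
import rlcard
from rlcard.agents import RandomAgent

# Create a card environment and attach baseline random agents.
env = rlcard.make('leduc-holdem')
env.set_agents([RandomAgent(num_actions=env.num_actions)
                for _ in range(env.num_players)])

# Run one complete game; trajectories hold the transitions,
# payoffs the final result for each player.
trajectories, payoffs = env.run(is_training=False)
print(payoffs)
```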



Author(s):  
Kai Liu ◽  
Hua Wang ◽  
Fei Han ◽  
Hao Zhang

Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operations across different times of day, months, and seasons introduce new challenges from significant variations in environment appearance. In this paper, we propose a novel method to learn a location representation that integrates the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations due to long-term visual changes, we formulate our objective using non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes the ratio of the ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark data sets, the CMU-VL and Nordland data sets. Experimental results validate the effectiveness of our new method in long-term visual place recognition applications.
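
The exact objective is specific to the paper, but the ℓ2,1 norm it builds on is simple to state: the sum of the ℓ2 norms of a matrix's rows. A small helper plus a schematic of the kind of orthogonality-constrained ratio problem described (the matrices here are placeholders, not the paper's formulation):

```python
import numpy as np

def l21_norm(M):
    """l2,1 norm: sum of the Euclidean norms of the rows of M."""
    return np.linalg.norm(M, axis=1).sum()

# Schematic of a ratio-of-l21-norms problem with an orthogonal solution:
#   minimize  l21_norm(X @ W - Y) / l21_norm(W)   subject to  W.T @ W = I
```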


Symmetry ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1120
Author(s):  
Teddy Lazebnik ◽  
Svetlana Bunimovich-Mendrazitsky ◽  
Leonid Shaikhet

We present a new analytical method, based on the Markov chain technique, for finding the asymptotically stable equilibrium states of a model. We demonstrate this method on the Susceptible-Infectious-Recovered (SIR)-type epidemiological model that we developed for viral diseases with long-term immunity memory. This is a large-scale model containing 15 nonlinear ordinary differential equations (ODEs), and classical methods have failed to obtain its equilibria analytically. The proposed method is used to conduct a comprehensive analysis via a stochastic representation of the dynamics of the model, followed by finding all asymptotically stable equilibrium states of the model for any values of parameters and initial conditions, thanks to the symmetry of the population size over time.
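
As a toy illustration of the Markov-chain idea (only the idea; the paper's 15-ODE model and its stochastic representation are far richer), the stationary states of a finite chain are the left eigenvectors of the transition matrix with eigenvalue 1:

```python
import numpy as np

def stationary_states(P, tol=1e-8):
    """Stationary distributions of a row-stochastic matrix P:
    normalized left eigenvectors with eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    out = []
    for i in np.where(np.isclose(w, 1.0, atol=tol))[0]:
        pi = np.real(v[:, i])
        out.append(pi / pi.sum())
    return out

# Toy 2-state chain with an absorbing second state:
P = np.array([[0.9, 0.1],
              [0.0, 1.0]])
print(stationary_states(P))   # -> [array([0., 1.])]
```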


10.29007/xtgm ◽  
2018 ◽  
Author(s):  
Yuezhang Li ◽  
Katia Sycara ◽  
Rahul Iyer

Deep reinforcement learning has become popular in recent years, showing superiority on various visual-input tasks such as playing Atari games and robot navigation. Although objects are important image elements, little work has considered enhancing deep reinforcement learning with object characteristics. In this paper, we propose a novel method that incorporates object recognition processing into deep reinforcement learning models. This approach can be adapted to any existing deep reinforcement learning framework. State-of-the-art results are shown in experiments on Atari games. We also propose a new approach called “object saliency maps” to visually explain the actions taken by deep reinforcement learning agents.
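
The abstract does not spell out the construction, but a common way to realize such maps is occlusion-based: remove each detected object and measure how much the agent's action value changes. A hedged sketch (the `q_fn` interface and mask format are assumptions, not the paper's exact procedure):

```python
import numpy as np

def object_saliency(q_fn, frame, object_masks, action):
    """Occlude each object and record the drop in the chosen
    action's Q-value; a larger drop means a more influential object."""
    base_q = q_fn(frame)[action]
    saliency = []
    for mask in object_masks:          # boolean mask per detected object
        occluded = frame.copy()
        occluded[mask] = frame.mean()  # crude background fill
        saliency.append(base_q - q_fn(occluded)[action])
    return np.array(saliency)
```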


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3332
Author(s):  
Wenzhen Huang ◽  
Qiyue Yin ◽  
Junge Zhang ◽  
Kaiqi Huang

StarCraft is a real-time strategy game that provides a complex environment for AI research. Macromanagement, i.e., selecting appropriate units to build depending on the current state, is one of the most important problems in this game. To reduce the requirement for expert knowledge and enhance the coordination of the systematic bot, we adopt reinforcement learning (RL) to tackle the problem of macromanagement. We propose a novel deep RL method, Mean Asynchronous Advantage Actor-Critic (MA3C), which computes the approximate expected policy gradient instead of the gradient of a sampled action to reduce the variance of the gradient estimate, and encodes the history queue with a recurrent neural network to tackle the problem of imperfect information. The experimental results show that MA3C achieves a very high win rate of approximately 90% against the weaker opponents and improves the win rate by about 30% against the stronger opponents. We also propose a novel method to visualize and interpret the policy learned by MA3C. Combining the visualized results with snapshots of games, we find that the learned macromanagement not only adapts to the game rules and the policy of the opponent bot, but also cooperates well with the other modules of MA3C-Bot.
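
The variance-reduction idea can be shown in a few lines: instead of weighting the log-probability of one sampled action by its return, the expected (mean) policy gradient weights every action's value by its probability. A schematic PyTorch loss, not the exact MA3C objective:

```python
import torch

def expected_policy_gradient_loss(logits, q_values):
    """Mean/expected policy gradient: sum over all actions,
    weighted by the policy, instead of one sampled action."""
    probs = torch.softmax(logits, dim=-1)
    return -(probs * q_values.detach()).sum(dim=-1).mean()

# Contrast with the sampled-action estimator for a drawn action a:
#   loss = -log_prob(a) * advantage(a)
```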

