ME-MCTS: Online Generalization by Combining Multiple Value Estimators

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/555 ◽

2021 ◽

Author(s):

Hendrik Baier ◽

Michael Kaisers

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Board Games ◽

Evaluation Functions ◽

Recent Advances ◽

Action Value ◽

Multiple Value ◽

Static Evaluation

This paper addresses the challenge of online generalization in tree search. We propose Multiple Estimator Monte Carlo Tree Search (ME-MCTS), with a two-fold contribution. First, we introduce a formalization of online generalization that can represent existing techniques such as "history heuristics", "RAVE", or "OMA" -- contextual action value estimators or abstractors that generalize across specific contexts. Second, we incorporate recent advances in estimator averaging that enable guiding search by combining the online action value estimates of any number of such abstractors or similar types of action value estimators. Unlike previous work, which usually proposed a single abstractor for either the selection or the rollout phase of MCTS simulations, our approach focuses on the combination of multiple estimators and applies them to all move choices in MCTS simulations. As the MCTS tree itself is just another value estimator -- unbiased, but without abstraction -- this blurs the traditional distinction between action choices inside and outside of the MCTS tree. Experiments with three abstractors in four board games show significant improvements of ME-MCTS over MCTS using only a single abstractor, both for MCTS with random rollouts and for MCTS with static evaluation functions. While we used deterministic, fully observable games, ME-MCTS naturally extends to more challenging settings.
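The idea of treating the tree and each abstractor as interchangeable action value estimators can be illustrated with a minimal sketch. It assumes a simple count-weighted average as the combination rule, which stands in for (and is not) the estimator averaging scheme used in the paper. The names ContextEstimator and combined_value, and the default value of 0.5 for unseen actions, are hypothetical.

```python
from collections import defaultdict


class ContextEstimator:
    """Running-mean action values keyed by a context function (hypothetical helper).

    The context function decides how much the estimator generalizes:
    (state, action) gives a tree-like estimator with no abstraction,
    action alone gives a history-heuristic-like abstractor.
    """

    def __init__(self, context_fn):
        self.context_fn = context_fn
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, state, action, reward):
        key = self.context_fn(state, action)
        self.total[key] += reward
        self.count[key] += 1

    def estimate(self, state, action):
        """Return (mean reward, sample count) for this context."""
        key = self.context_fn(state, action)
        n = self.count.get(key, 0)
        mean = self.total[key] / n if n else 0.0
        return mean, n


def combined_value(estimators, state, action, default=0.5):
    """Count-weighted average of all estimators (an assumed combination rule)."""
    weighted_sum = 0.0
    total_count = 0
    for est in estimators:
        mean, n = est.estimate(state, action)
        weighted_sum += n * mean
        total_count += n
    return weighted_sum / total_count if total_count else default


# One estimator without abstraction and one that ignores the state entirely.
tree = ContextEstimator(lambda s, a: (s, a))
history = ContextEstimator(lambda s, a: a)
estimators = [tree, history]

# Two simulated outcomes for action "a" in different states.
for est in estimators:
    est.update("s1", "a", 1.0)
    est.update("s2", "a", 0.0)
```

In a state that was never visited, the tree estimator has no samples, so the combined value falls back on what the abstractor learned elsewhere. This is the online generalization the abstract describes, and because the same call works at any move choice, it does not matter whether the state is inside or outside the tree.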

Do evaluation functions really improve Monte-Carlo tree search?

ICGA Journal ◽

10.3233/icg-180060 ◽

2019 ◽

Vol 40 (3) ◽

pp. 294-304

Author(s):

Kiminori Matsuzaki ◽

Naoki Kitamura

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Evaluation Functions

Using evaluation functions in Monte-Carlo Tree Search

Theoretical Computer Science ◽

10.1016/j.tcs.2016.06.026 ◽

2016 ◽

Vol 644 ◽

pp. 106-113 ◽

Cited By ~ 2

Author(s):

Richard Lorentz

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Evaluation Functions

Monte-Carlo tree search and rapid action value estimation in computer Go

Artificial Intelligence ◽

10.1016/j.artint.2011.03.007 ◽

2011 ◽

Vol 175 (11) ◽

pp. 1856-1875 ◽

Cited By ~ 131

Author(s):

Sylvain Gelly ◽

David Silver

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Computer Go ◽

Rapid Action ◽

Value Estimation ◽

Action Value

Three-Head Neural Network Architecture for Monte Carlo Tree Search

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/523 ◽

2018 ◽

Cited By ~ 2

Author(s):

Chao Gao ◽

Martin Müller ◽

Ryan Hayward

Keyword(s):

Monte Carlo ◽

Network Architecture ◽

Data Augmentation ◽

The State ◽

Neural Nets ◽

Tree Search ◽

Neural Net ◽

Neural Network Architecture ◽

Monte Carlo Tree Search ◽

Action Value

AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output is used for prior action probability and the state-value estimate is used for leaf node evaluation. We propose a three-head neural net architecture with policy, state-value, and action-value outputs, which could lead to more efficient MCTS since a neural leaf estimate can still be back-propagated in the tree under delayed node expansion and evaluation. To effectively train the newly introduced action-value head on the same game dataset as for two-head nets, we exploit the optimal relations between parent and children nodes for data augmentation and regularization. In our experiments for the game of Hex, the action-value head achieves an error similar to that of the state-value prediction of a two-head architecture. The resulting neural net models are then combined with the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural net evaluations, PV-MCTS with three-head neural nets consistently performs better than with two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.
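One way to read the "optimal relations between parent and children nodes" is the standard negamax consistency of a two-player zero-sum game: q(s, a) = -v(s') for the successor s' reached by action a, and v(s) = max over a of q(s, a). The sketch below assumes this reading, with values in [-1, 1] from the perspective of the player to move. It measures how far a set of predictions is from satisfying those relations. The function name and the squared-error form are illustrative assumptions, not the paper's loss.

```python
def consistency_penalties(v_parent, q_parent, v_children):
    """Squared violations of the negamax relations (illustrative, assumed form).

    v_parent   -- predicted state value of the parent
    q_parent   -- dict: action -> predicted action value at the parent
    v_children -- dict: action -> predicted state value of the resulting child
    """
    # q(s, a) should equal -v(s'), so q(s, a) + v(s') should be zero.
    action_terms = {a: (q_parent[a] + v_children[a]) ** 2 for a in v_children}
    # v(s) should equal the best action value at s.
    state_term = (v_parent - max(q_parent.values())) ** 2
    return action_terms, state_term


# Hypothetical predictions for a parent with two actions.
q_parent = {"a": 0.5, "b": -0.25}
v_children = {"a": -0.5, "b": 0.5}
action_terms, state_term = consistency_penalties(0.5, q_parent, v_children)
```

Here action "a" is consistent with its child, while action "b" is not, so only the term for "b" is non-zero. Penalties of this kind could serve as a regularizer, and the same relations could supply action-value targets from positions that only carry state-value labels.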

Fuego—An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search

IEEE Transactions on Computational Intelligence and AI in Games ◽

10.1109/tciaig.2010.2083662 ◽

2010 ◽

Vol 2 (4) ◽

pp. 259-270 ◽

Cited By ~ 55

Author(s):

Markus Enzenberger ◽

Martin Müller ◽

Broderick Arneson ◽

Richard Segal

Keyword(s):

Monte Carlo ◽

Open Source ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Board Games ◽

Open Source Framework

MCTS-Minimax Hybrids with State Evaluations

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11208 ◽

2018 ◽

Vol 62 ◽

pp. 193-231 ◽

Cited By ~ 1

Author(s):

Hendrik Baier ◽

Mark H. M. Winands

Keyword(s):

Monte Carlo ◽

Domain Knowledge ◽

New Technique ◽

Tree Search ◽

Evaluation Function ◽

Heuristic Evaluation ◽

Monte Carlo Tree Search ◽

Evaluation Functions ◽

Selective Search ◽

Computing Node

Monte-Carlo Tree Search (MCTS) has been found to show weaker play than minimax-based search in some tactical game domains. This is partly due to its highly selective search and averaging value backups, which make it susceptible to traps. In order to combine the strategic strength of MCTS and the tactical strength of minimax, MCTS-minimax hybrids have been introduced, embedding shallow minimax searches into the MCTS framework. Their results have been promising even without making use of domain knowledge such as heuristic evaluation functions. This article continues this line of research for the case where evaluation functions are available. Three different approaches are considered, employing minimax with an evaluation function in the rollout phase of MCTS, as a replacement for the rollout phase, and as a node prior to bias move selection. The latter two approaches are newly proposed. Furthermore, all three hybrids are enhanced with the help of move ordering and k-best pruning for minimax. Results show that the use of enhanced minimax for computing node priors results in the strongest MCTS-minimax hybrid investigated in the three test domains of Othello, Breakthrough, and Catch the Lion. This hybrid, called MCTS-IP-M-k, also outperforms enhanced minimax as a standalone player in Breakthrough, demonstrating that at least in this domain, MCTS and minimax can be combined into an algorithm stronger than its parts. Using enhanced minimax for computing node priors is therefore a promising new technique for integrating domain knowledge into an MCTS framework.
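The node-prior approach can be sketched as follows: a shallow, depth-limited minimax search returns a heuristic value for a newly added node, and that value seeds the node's visit and win counts so that early move selection is biased by it. The sketch assumes plain negamax without move ordering or k-best pruning, values in [-1, 1] from the perspective of the player to move, and a toy take-1-or-2-stones game in which taking the last stone wins. All function names and the prior_visits parameter are hypothetical.

```python
def negamax(state, depth, moves, apply_move, evaluate):
    """Depth-limited negamax; the value is for the player to move."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    return max(
        -negamax(apply_move(state, m), depth - 1, moves, apply_move, evaluate)
        for m in legal
    )


def node_prior(state, depth, prior_visits, moves, apply_move, evaluate):
    """Return (visits, wins) used to initialize a new MCTS node (assumed scheme)."""
    value = negamax(state, depth, moves, apply_move, evaluate)  # in [-1, 1]
    win_rate = (value + 1.0) / 2.0  # map to [0, 1]
    return prior_visits, prior_visits * win_rate


# Toy game: state is the number of stones left; a move removes 1 or 2 stones.
def moves(stones):
    return [m for m in (1, 2) if m <= stones]


def apply_move(stones, m):
    return stones - m


def evaluate(stones):
    # With no stones left, the player to move has already lost.
    # Any other position is scored as unknown (0.0) by this toy heuristic.
    return -1.0 if stones == 0 else 0.0


losing_prior = node_prior(3, 2, 10, moves, apply_move, evaluate)
winning_prior = node_prior(2, 1, 10, moves, apply_move, evaluate)
unclear_prior = node_prior(4, 2, 10, moves, apply_move, evaluate)
```

With three stones the player to move loses against best play, which a two-ply search already detects, so the node starts with ten visits and no wins. With four stones the same search depth cannot resolve the position and the prior stays neutral.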

On Monte Carlo Tree Search and Reinforcement Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.5507 ◽

2017 ◽

Vol 60 ◽

pp. 881-936 ◽

Cited By ~ 8

Author(s):

Tom Vodopivec ◽

Spyridon Samothrakis ◽

Branko Ster

Keyword(s):

Monte Carlo ◽

Reinforcement Learning ◽

Video Game ◽

Close Relation ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Computer Go ◽

Board Games ◽

Planning Methods ◽

Unified View

Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not been thoroughly studied yet. In this paper we re-examine in depth this close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, for which the traditional MCTS is only one of the variants. We confirm that planning methods inspired by RL in conjunction with online search demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search.

Monte-Carlo Tree Search and Minimax Hybrids with Heuristic Evaluation Functions

Communications in Computer and Information Science - Computer Games ◽

10.1007/978-3-319-14923-3_4 ◽

2014 ◽

pp. 45-63 ◽

Cited By ~ 4

Author(s):

Hendrik Baier ◽

Mark H. M. Winands

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Heuristic Evaluation ◽

Monte Carlo Tree Search ◽

Evaluation Functions

Monte-Carlo Tree Search in Board Games

Handbook of Digital Games and Entertainment Technologies ◽

10.1007/978-981-4560-52-8_27-1 ◽

2015 ◽

pp. 1-30 ◽

Cited By ~ 1

Author(s):

Mark H. M. Winands

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Board Games

MCTS-Minimax Hybrids with State Evaluations (Extended Abstract)

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/782 ◽

2018 ◽

Author(s):

Hendrik Baier ◽

Mark H. M. Winands

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Prior Work ◽

Monte Carlo Tree Search ◽

Evaluation Functions ◽

State Evaluation ◽

Computing Node

Monte-Carlo Tree Search (MCTS) has been found to show weaker play than minimax-based search in some tactical game domains. In order to combine the tactical strength of minimax and the strategic strength of MCTS, MCTS-minimax hybrids have been proposed in prior work. This article continues this line of research for the case where heuristic state evaluation functions are available. Three different approaches are considered, employing minimax in the rollout phase of MCTS, as a replacement for the rollout phase, and as a node prior to bias move selection. The latter two approaches are newly proposed. Results show that the use of enhanced minimax for computing node priors results in the strongest MCTS-minimax hybrid in the three test domains of Othello, Breakthrough, and Catch the Lion. This hybrid also outperforms enhanced minimax as a standalone player in Breakthrough, demonstrating that at least in this domain, MCTS and minimax can be combined into an algorithm stronger than its parts.
