Monte-Carlo Tree Search and Minimax Hybrids with Heuristic Evaluation Functions

Author(s):  
Hendrik Baier ◽  
Mark H. M. Winands
2018 ◽  
Vol 62 ◽  
pp. 193-231

Monte-Carlo Tree Search (MCTS) has been found to show weaker play than minimax-based search in some tactical game domains. This is partly due to its highly selective search and averaging value backups, which make it susceptible to traps. To combine the strategic strength of MCTS and the tactical strength of minimax, MCTS-minimax hybrids have been introduced, embedding shallow minimax searches into the MCTS framework. Their results have been promising even without making use of domain knowledge such as heuristic evaluation functions. This article continues this line of research for the case where evaluation functions are available. Three different approaches are considered, employing minimax with an evaluation function in the rollout phase of MCTS, as a replacement for the rollout phase, and as a node prior to bias move selection. The latter two approaches are newly proposed. Furthermore, all three hybrids are enhanced with the help of move ordering and k-best pruning for minimax. Results show that using enhanced minimax to compute node priors yields the strongest MCTS-minimax hybrid investigated in the three test domains of Othello, Breakthrough, and Catch the Lion. This hybrid, called MCTS-IP-M-k, also outperforms enhanced minimax as a standalone player in Breakthrough, demonstrating that at least in this domain, MCTS and minimax can be combined into an algorithm stronger than its parts. Using enhanced minimax for computing node priors is therefore a promising new technique for integrating domain knowledge into an MCTS framework.
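
The node-prior idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy subtraction game (take 1 to 3 stones, whoever takes the last stone wins), the names `evaluate`, `negamax`, `make_child`, and `uct_select`, and the choice to encode the prior as virtual visits and wins weighted by `prior_weight` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


def moves(pile):
    """Legal moves in the toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


def evaluate(pile):
    """Hypothetical heuristic in [-1, 1], from the viewpoint of the player to move."""
    if pile == 0:
        return -1.0  # the opponent took the last stone, so the player to move has lost
    return 1.0 if pile % 4 != 0 else -1.0


def negamax(pile, depth, k):
    """Depth-limited minimax (negamax form) with move ordering and k-best pruning."""
    if pile == 0 or depth == 0:
        return evaluate(pile)
    # Move ordering: a child that looks bad for the opponent is searched first.
    ordered = sorted(moves(pile), key=lambda m: evaluate(pile - m))
    best = -math.inf
    for m in ordered[:k]:  # k-best pruning: only the k most promising moves
        best = max(best, -negamax(pile - m, depth - 1, k))
    return best


@dataclass
class Node:
    pile: int
    visits: int = 0
    wins: float = 0.0  # from the viewpoint of the player who moved into this node


def make_child(pile_after_move, depth=2, k=2, prior_weight=10):
    """Create a tree node whose statistics are seeded by a shallow minimax search."""
    value = -negamax(pile_after_move, depth, k)  # flip to the mover's viewpoint
    win_rate = (value + 1.0) / 2.0               # map [-1, 1] to [0, 1]
    return Node(pile_after_move, visits=prior_weight, wins=win_rate * prior_weight)


def uct_select(children, c=0.7):
    """Standard UCT selection; the priors act as virtual visits and wins."""
    total = sum(ch.visits for ch in children)
    return max(
        children,
        key=lambda ch: ch.wins / ch.visits + c * math.sqrt(math.log(total) / ch.visits),
    )


children = [make_child(5 - m) for m in moves(5)]
best_child = uct_select(children)
```

With a pile of five stones, the prior alone already steers selection towards leaving four stones, before any rollout has been played. Real simulations would then add to `visits` and `wins`, so the influence of the prior fades as evidence accumulates.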


ICGA Journal ◽  
2019 ◽  
Vol 40 (3) ◽  
pp. 294-304
Author(s):  
Kiminori Matsuzaki ◽  
Naoki Kitamura

Author(s):  
Hendrik Baier ◽  
Michael Kaisers

This paper addresses the challenge of online generalization in tree search. We propose Multiple Estimator Monte Carlo Tree Search (ME-MCTS), with a two-fold contribution: first, we introduce a formalization of online generalization that can represent existing techniques such as "history heuristics", "RAVE", or "OMA" -- contextual action value estimators or abstractors that generalize across specific contexts. Second, we incorporate recent advances in estimator averaging that enable guiding search by combining the online action value estimates of any number of such abstractors or similar types of action value estimators. Unlike previous work, which usually proposed a single abstractor for either the selection or the rollout phase of MCTS simulations, our approach focuses on the combination of multiple estimators and applies them to all move choices in MCTS simulations. As the MCTS tree itself is just another value estimator -- unbiased, but without abstraction -- this blurs the traditional distinction between action choices inside and outside of the MCTS tree. Experiments with three abstractors in four board games show significant improvements of ME-MCTS over MCTS using only a single abstractor, both for MCTS with random rollouts and for MCTS with static evaluation functions. While we used deterministic, fully observable games, ME-MCTS naturally extends to more challenging settings.
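
To make the idea of combining several action value estimators concrete, here is a minimal sketch. The abstract does not specify the averaging scheme, so the sample-count-weighted mean below is a stand-in chosen for simplicity, not the paper's method. The names `Estimator`, `combined_value`, `tree`, and `history` are hypothetical.

```python
from collections import defaultdict


class Estimator:
    """Running-mean action value estimator over a context given by key_fn.

    key_fn maps (state, action) to the context the estimator generalizes over.
    """

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.n = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, state, action, reward):
        key = self.key_fn(state, action)
        self.n[key] += 1
        self.total[key] += reward

    def estimate(self, state, action):
        """Return (mean reward, sample count) for the matching context."""
        key = self.key_fn(state, action)
        count = self.n[key]
        return (self.total[key] / count if count else 0.0), count


def combined_value(estimators, state, action, default=0.5):
    """Assumed combination rule: average the estimates, weighted by sample count."""
    pairs = [e.estimate(state, action) for e in estimators]
    weight = sum(count for _, count in pairs)
    if weight == 0:
        return default  # no estimator has seen this context yet
    return sum(mean * count for mean, count in pairs) / weight


# The tree keys on the exact state (no abstraction); a history-heuristic-style
# abstractor keys on the action alone and so generalizes across states.
tree = Estimator(lambda s, a: (s, a))
history = Estimator(lambda s, a: a)

for state, action, reward in [("s1", "a", 1.0), ("s2", "a", 0.0)]:
    for est in (tree, history):
        est.update(state, action, reward)

seen = combined_value([tree, history], "s1", "a")
unseen = combined_value([tree, history], "s3", "a")
```

In the unseen state `"s3"` the tree estimator has no samples, so the combined value falls back on the abstractor. In `"s1"` both contribute. The same function could be called for move choices inside the tree and during rollouts, which is the blurring of that distinction the abstract describes.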


Author(s):  
Hendrik Baier ◽  
Mark H. M. Winands

Monte-Carlo Tree Search (MCTS) has been found to show weaker play than minimax-based search in some tactical game domains. To combine the tactical strength of minimax and the strategic strength of MCTS, MCTS-minimax hybrids have been proposed in prior work. This article continues this line of research for the case where heuristic state evaluation functions are available. Three different approaches are considered, employing minimax in the rollout phase of MCTS, as a replacement for the rollout phase, and as a node prior to bias move selection. The latter two approaches are newly proposed. Results show that using enhanced minimax to compute node priors yields the strongest MCTS-minimax hybrid in the three test domains of Othello, Breakthrough, and Catch the Lion. This hybrid also outperforms enhanced minimax as a standalone player in Breakthrough, demonstrating that at least in this domain, MCTS and minimax can be combined into an algorithm stronger than its parts.


2018 ◽  
Vol 42 (4) ◽  
pp. 1-5 ◽  
Author(s):  
Reed M. Milewicz ◽  
Simon Poulding
