Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

2020, Vol 68 (6), pp. 1678-1697
Author(s): Daniel R. Jiang, Lina Al-Kanj, Warren B. Powell

In the paper, “Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds,” the authors propose an extension to Monte Carlo tree search that uses the idea of “sampling the future” to produce noisy upper bounds on nodes in the decision tree. These upper bounds can help guide the tree expansion process and produce decision trees that are deeper rather than wider, in effect concentrating computation toward more useful parts of the state space. The algorithm’s effectiveness is illustrated in a ride-sharing setting, where a driver/vehicle needs to make dynamic decisions regarding trip acceptance and relocations.
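The idea of "sampling the future" to obtain a noisy upper bound can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes a hypothetical driver who may accept at most `capacity` trips over `horizon` offers, with fares drawn uniformly from 0 to 10, and it uses the simplest relaxation (full knowledge of the sampled future, with no penalty term). Every name and number below is an assumption made for illustration.

```python
import random


def perfect_information_value(fares, capacity):
    """Best total fare when the whole sampled future is known in advance."""
    return sum(sorted(fares, reverse=True)[:capacity])


def sampled_upper_bound(horizon, capacity, num_samples, rng):
    """Average the perfect-information value over sampled futures.

    Each sample gives a noisy, optimistic estimate; the average estimates an
    upper bound on what any policy without foresight can earn in expectation.
    """
    total = 0.0
    for _ in range(num_samples):
        fares = [rng.uniform(0.0, 10.0) for _ in range(horizon)]
        total += perfect_information_value(fares, capacity)
    return total / num_samples


def threshold_policy_value(horizon, capacity, threshold, num_samples, rng):
    """Simulate a policy without foresight: accept any fare above a threshold."""
    total = 0.0
    for _ in range(num_samples):
        remaining = capacity
        for _ in range(horizon):
            fare = rng.uniform(0.0, 10.0)
            if remaining > 0 and fare >= threshold:
                total += fare
                remaining -= 1
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    bound = sampled_upper_bound(horizon=10, capacity=3, num_samples=2000, rng=rng)
    policy = threshold_policy_value(
        horizon=10, capacity=3, threshold=6.0, num_samples=2000, rng=rng
    )
    print(f"sampled upper bound: {bound:.2f}")
    print(f"threshold policy:    {policy:.2f}")
```

In a tree search, a bound of this kind could be used to deprioritize actions whose optimistic value is already below that of an alternative, which is one way to read the abstract's remark about trees that are deeper rather than wider.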

Author(s): Yanqiu Cheng, Xianbiao Hu, Qing Tang, Hongsheng Qi, Hong Yang

A model-free approach based on the Monte Carlo tree search (MCTS) algorithm, named MCTS-MTF, is presented for the control of mixed traffic flow of human-driven vehicles (HDV) and connected and autonomous vehicles (CAV) on a one-lane roadway with signalized intersection control. Previous research has often simplified the problem to reduce the computational burden, for example by dividing a vehicle trajectory into several segments with constant speed or linear acceleration/deceleration, which is rather unrealistic. This study departs from existing research in that it imposes only minimal constraints on CAV trajectory control, as long as basic rules such as safety considerations and vehicular performance limits are followed. To improve solution quality and run-time efficiency over the naïve MCTS algorithm, two modules are introduced: an exploration-exploitation balance calibration module, and a tree expansion determination module that expands the tree more effectively along the desired direction. In a case study, the proposed algorithm achieved a travel time saving of 3.5% and a fuel consumption saving of 6.5%. It also ran eight times as fast as a naïve MCTS model, suggesting promising potential for real-time or near real-time applications.
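The abstract does not describe how the exploration-exploitation balance is calibrated. As a point of reference only, the sketch below shows the standard UCB1 selection rule commonly used in MCTS, where a single constant (here called `exploration`) sets that balance. The `Node` class, the action names, and the statistics are hypothetical and are not taken from MCTS-MTF.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal search-tree node: visit count, summed reward, children by action."""
    visits: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)


def ucb1_score(parent_visits, child, exploration):
    """Mean reward plus an exploration bonus that shrinks as the child is visited."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.total_reward / child.visits
    bonus = exploration * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus


def select_action(node, exploration):
    """Pick the child action with the highest UCB1 score."""
    return max(
        node.children,
        key=lambda action: ucb1_score(node.visits, node.children[action], exploration),
    )


if __name__ == "__main__":
    # Hypothetical statistics for two candidate CAV actions.
    root = Node(visits=20)
    root.children["accelerate"] = Node(visits=15, total_reward=12.0)
    root.children["coast"] = Node(visits=5, total_reward=3.0)
    print(select_action(root, exploration=0.0))  # pure exploitation
    print(select_action(root, exploration=2.0))  # favors the less-visited child
```

With `exploration` set to zero the rule always picks the child with the best observed mean. Larger values shift visits toward less-explored children, which is why the constant usually needs tuning for a given reward scale.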

