MABWiser: Parallelizable Contextual Multi-armed Bandits
Contextual multi-armed bandit algorithms are an effective approach to online sequential decision-making problems. However, few tools are available to support their adoption in the community. To fill this gap, we present an open-source Python library with context-free, parametric, and non-parametric contextual multi-armed bandit algorithms. The MABWiser library is designed to be user-friendly and supports custom bandit algorithms for specific applications. Our design provides built-in parallelization to speed up training and testing for scalability, with special attention given to the reproducibility of results. The API enables hybrid strategies that combine non-parametric policies with parametric ones, an area that is not explored in the literature. As a practical application, we demonstrate the library in both batch and online simulations of context-free, parametric, and non-parametric contextual policies on the well-known MovieLens data set. Finally, we quantify the performance benefits of the built-in parallelization.
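To make the batch and online settings mentioned above concrete, the following is a minimal sketch of a context-free epsilon-greedy bandit written with the Python standard library only. It is not MABWiser's API: the class `EpsilonGreedyBandit`, its methods `partial_fit`, `expectation`, and `predict`, and the toy arms and rewards are all hypothetical names and values chosen for illustration. The private seeded random generator illustrates one simple way to keep results reproducible.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Hypothetical context-free bandit for illustration; not MABWiser's API."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # private seeded RNG for reproducible runs
        self.counts = defaultdict(int)    # number of times each arm was played
        self.totals = defaultdict(float)  # sum of rewards observed per arm

    def partial_fit(self, decisions, rewards):
        """Update per-arm statistics from paired decisions and rewards."""
        for arm, reward in zip(decisions, rewards):
            self.counts[arm] += 1
            self.totals[arm] += reward

    def expectation(self, arm):
        """Mean observed reward of an arm (0.0 if the arm was never played)."""
        n = self.counts[arm]
        return self.totals[arm] / n if n else 0.0

    def predict(self):
        """Explore a random arm with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.expectation)


# Batch phase: learn from logged (decision, reward) pairs (toy data).
bandit = EpsilonGreedyBandit(arms=["A", "B"], epsilon=0.0, seed=42)
bandit.partial_fit(["A", "A", "B", "B"], [1, 0, 1, 1])

# Online phase: choose an arm, observe a reward, and update incrementally.
choice = bandit.predict()
bandit.partial_fit([choice], [1])
```

With `epsilon=0.0` the policy is purely greedy, so the online step always selects the arm with the highest observed mean reward. A contextual policy would additionally condition the expectation on a feature vector, which this sketch deliberately omits.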